Graph-Based Feature Augmentation for Predictive Tasks on Relational Datasets

Qiao, Lianpeng, Cao, Ziqi, Feng, Kaiyu, Yuan, Ye, Wang, Guoren

arXiv.org Artificial Intelligence

Data has become a foundational asset driving innovation across domains such as finance, healthcare, and e-commerce. In these areas, predictive modeling over relational tables is commonly employed, with increasing emphasis on reducing manual effort through automated machine learning (AutoML) techniques. This raises an interesting question: can feature augmentation itself be automated to identify and utilize task-related relational signals? To address this challenge, we propose an end-to-end automated feature augmentation framework, ReCoGNN, which enhances initial datasets using features extracted from multiple relational tables to support predictive tasks. ReCoGNN first captures semantic dependencies within each table by modeling intra-table attribute relationships, enabling it to partition tables into structured, semantically coherent segments. It then constructs a heterogeneous weighted graph that represents inter-row relationships across all segments. Finally, ReCoGNN leverages message-passing graph neural networks to propagate information through the graph, guiding feature selection and augmenting the original dataset. Extensive experiments conducted on ten real-life and synthetic datasets demonstrate that ReCoGNN consistently outperforms existing methods on both classification and regression tasks.
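The message passing over a heterogeneous weighted graph described in the abstract can be illustrated with a minimal, dependency-free sketch. This is a generic weighted-neighbor aggregation step, not the authors' ReCoGNN implementation; the function name `message_pass` and the data layout are our own assumptions.

```python
def message_pass(features, edges):
    """One message-passing step: each node's new feature vector is the
    edge-weight-normalized mean of its neighbors' feature vectors.

    features: {node: [float, ...]}
    edges:    {node: [(neighbor, weight), ...]}
    """
    new_features = {}
    for node, feat in features.items():
        neighbors = edges.get(node, [])
        if not neighbors:
            # Isolated nodes keep their current features.
            new_features[node] = list(feat)
            continue
        total_weight = sum(w for _, w in neighbors)
        aggregated = [0.0] * len(feat)
        for neighbor, w in neighbors:
            for i, value in enumerate(features[neighbor]):
                aggregated[i] += w * value
        new_features[node] = [v / total_weight for v in aggregated]
    return new_features

feats = {"a": [1.0], "b": [3.0], "c": [5.0]}
edges = {"a": [("b", 1.0), ("c", 1.0)]}
out = message_pass(feats, edges)
# node "a" aggregates its neighbors b and c: (3 + 5) / 2 = 4.0
```

In a GNN, this aggregation would be interleaved with learned transformations and repeated for several rounds so information propagates beyond immediate neighbors.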


FeatNavigator: Automatic Feature Augmentation on Tabular Data

Liang, Jiaming, Lei, Chuan, Qin, Xiao, Zhang, Jiani, Katsifodimos, Asterios, Faloutsos, Christos, Rangwala, Huzefa

arXiv.org Artificial Intelligence

Data-centric AI focuses on understanding and utilizing high-quality, relevant data in training machine learning (ML) models, thereby increasing the likelihood of producing accurate and useful results. Automatic feature augmentation, aiming to augment the initial base table with useful features from other tables, is critical in data preparation as it improves model performance, robustness, and generalizability. While recent works have investigated automatic feature augmentation, most of them have limited capabilities in utilizing all useful features as many of them are in candidate tables not directly joinable with the base table. Worse yet, with numerous join paths leading to these distant features, existing solutions fail to fully exploit them within a reasonable compute budget. We present FeatNavigator, an effective and efficient framework that explores and integrates high-quality features in relational tables for ML models. FeatNavigator evaluates a feature from two aspects: (1) the intrinsic value of a feature towards an ML task (i.e., feature importance) and (2) the efficacy of a join path connecting the feature to the base table (i.e., integration quality). FeatNavigator strategically selects a small set of available features and their corresponding join paths to train a feature importance estimation model and an integration quality prediction model. Furthermore, FeatNavigator's search algorithm exploits both estimated feature importance and integration quality to identify the optimized feature augmentation plan. Our experimental results show that FeatNavigator outperforms state-of-the-art solutions on five public datasets by up to 40.1% in ML model performance.


Retrieve, Merge, Predict: Augmenting Tables with Data Lakes

Cappuzzo, Riccardo, Varoquaux, Gael, Coelho, Aimee, Papotti, Paolo

arXiv.org Artificial Intelligence

We present an in-depth analysis of data discovery in data lakes, focusing on table augmentation for given machine learning tasks. We analyze alternative methods used in the three main steps: retrieving joinable tables, merging information, and predicting with the resultant table. As data lakes, the paper uses YADL (Yet Another Data Lake) -- a novel dataset we developed as a tool for benchmarking this data discovery task -- and Open Data US, a well-referenced real data lake. Through systematic exploration on both lakes, our study outlines the importance of accurately retrieving join candidates and the efficiency of simple merging methods. We report new insights on the benefits of existing solutions and on their limitations, aiming at guiding future research in this space.
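The "retrieve joinable tables, then merge" step the abstract describes can be sketched with a toy example; the city/population tables below are invented for illustration and are not from the paper's benchmarks.

```python
import pandas as pd

# Base table carrying the prediction target.
base = pd.DataFrame({"city": ["Paris", "Lyon"], "y": [1, 0]})

# A candidate table retrieved from the data lake, joinable on "city".
candidate = pd.DataFrame({"city": ["Paris", "Lyon"],
                          "population": [2_100_000, 520_000]})

# A left join is one of the simple merging methods the study evaluates:
# it keeps every base-table row even when a candidate row is missing.
augmented = base.merge(candidate, on="city", how="left")
```

The prediction step then trains a model on `augmented` and compares its performance against a model trained on `base` alone.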


Optimize your Amazon Redshift query performance with automated materialized views - Channel969

#artificialintelligence

Amazon Redshift is a fast, fully managed cloud data warehouse that makes it cost-effective to analyze your data using standard SQL and business intelligence tools. Amazon Redshift lets you analyze structured and semi-structured data and seamlessly query data lakes and operational databases, using AWS-designed hardware and automated machine learning (ML)-based tuning to deliver top-tier price-performance at scale. Although Amazon Redshift provides excellent price-performance out of the box, it offers additional optimizations that can improve this performance and help you achieve even faster query response times from your data warehouse. For example, you can physically tune tables in a data model to minimize the amount of data scanned and distributed within a cluster, which speeds up operations such as table joins and range-bound scans. Amazon Redshift now automates this tuning with the automatic table optimization (ATO) feature.


How to Split and Sample a Dataset in BigQuery Using SQL

#artificialintelligence

Splitting data means that we will divide it into subsets. For data science models, datasets are usually partitioned into two or three subsets: training, validation, and test. Each subset of data has a purpose, from creating a model to ensuring its performance. To decide on the size of each subset, we often see standard rules and ratios. There have been some discussions about what an optimal split might be, but in general, I would recommend keeping in mind that not having enough data in either the training or validation set will result in a model that is difficult to train, or will make it hard to determine whether the model actually performs well. It's worth noting that you don't always have to make three segments.
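A common BigQuery pattern for splitting is to hash a stable row identifier into buckets so the assignment is repeatable across runs. The sketch below mimics that idea in Python, using `hashlib.md5` in place of BigQuery's `FARM_FINGERPRINT`; the 80/10/10 ratio is one common choice, not a rule prescribed by the article.

```python
import hashlib

def assign_split(row_id: str, train_pct: int = 80, val_pct: int = 10) -> str:
    """Deterministically assign a row to train/validation/test by hashing
    its ID into one of 100 buckets, mirroring the BigQuery pattern of
    bucketing on ABS(MOD(FARM_FINGERPRINT(id), 100))."""
    bucket = int(hashlib.md5(row_id.encode()).hexdigest(), 16) % 100
    if bucket < train_pct:
        return "train"
    if bucket < train_pct + val_pct:
        return "validation"
    return "test"

rows = [f"row-{i}" for i in range(1000)]
splits = [assign_split(r) for r in rows]
```

Because the split depends only on the row ID, re-running the query (or adding new rows) never moves an existing row between subsets, which keeps evaluation honest.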


Predict House Prices with Machine Learning

#artificialintelligence

A lot of feature engineering rests on domain expertise. If you have a subject matter expert (SME) on real estate to provide guidance, you'll have a better chance of engineering some awesome feature that will really make your modelling shine. The following code creates these new features. The new property_age feature arguably supersedes the original tx_year and year_built, so we'll remove them. Analytical base table: the dataset obtained after applying all of these data cleaning and feature engineering steps is our analytical base table.
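The derived feature described above can be sketched in a few lines of pandas, assuming a DataFrame with the article's `tx_year` and `year_built` columns (the sample values are invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "tx_year":    [2004, 2010, 1998],
    "year_built": [1985, 2005, 1998],
    "price":      [250_000, 310_000, 180_000],
})

# property_age at transaction time captures what the two raw year
# columns jointly encode, so we drop them after deriving it.
df["property_age"] = df["tx_year"] - df["year_built"]
df = df.drop(columns=["tx_year", "year_built"])
```

Collapsing two correlated raw columns into one interpretable feature also reduces redundancy in the analytical base table.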


ARDA: Automatic Relational Data Augmentation for Machine Learning

Chepurko, Nadiia, Marcus, Ryan, Zgraggen, Emanuel, Fernandez, Raul Castro, Kraska, Tim, Karger, David

arXiv.org Machine Learning

Automatic machine learning (AML) is a family of techniques to automate the process of training predictive models, aiming to both improve performance and make machine learning more accessible. While many recent works have focused on aspects of the machine learning pipeline like model selection, hyperparameter tuning, and feature selection, relatively few works have focused on automatic data augmentation. Automatic data augmentation involves finding new features relevant to the user's predictive task with minimal "human-in-the-loop" involvement. We present ARDA, an end-to-end system that takes as input a dataset and a data repository, and outputs an augmented data set such that training a predictive model on this augmented dataset results in improved performance. Our system has two distinct components: (1) a framework to search and join data with the input data, based on various attributes of the input, and (2) an efficient feature selection algorithm that prunes out noisy or irrelevant features from the resulting join. We perform an extensive empirical evaluation of different system components and benchmark our feature selection algorithm on real-world datasets.